[Supplementary Figure 1 image, panels a-g. Recoverable labels: Normal, Tumor; Tumor/normal pairing accuracy (Self; Self - best nonself).]


Supplementary Figure 1. Whole-genome sequencing (WGS) data QC. Among 172 paired
tumor and normal samples from 86 dogs with published WGS data, a total of 72 samples from 36
dogs passed our QC (Supplementary Data 1). Panels a-g are presented as described in panels a-c
and e-h of Figure 1, respectively. a-b) n = 67, 4 and 15; c-d) n = 42, 4 and 15; and e) n = 36 (33
normal), 4 (3 tumors) and 15 (0 normal samples) independent cases for matched normal and
tumor samples of GLM, OM, and OSA, respectively. f-g) n = 33 and 3 independent cases of
GLM and OM, respectively. Source data are provided as a Source Data file.
Supplementary Figure 2. Corroboration of our breed validation and prediction strategy.

a. The WES dataset is divided into the discovery cohort, which consists of large-sample-size studies and has 362 dogs, and the
validation cohort, which consists of small-sample-size studies and has 19 dogs (Supplementary Data 2). The discovery cohort has
9 breeds, each with >10 dogs, which were used to identify breed-specific germline base substitution and small indel variants
(Supplementary Data 2). VAF values of the variants identified for these 9 breeds (5,892 total; see Supplementary Data 2) were then
used to cluster the animals from both the discovery and validation cohorts. The image is presented as described in Figure 2.
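As a simplified illustration of how breed-specific variant VAFs carry breed information (this is not the hierarchical clustering presented in Figure 2), one could score each dog against each breed's variant set by mean VAF and report the top-scoring breed. The function `predict_breed`, the treatment of missing variants as VAF 0, the `min_score` cutoff and the variant IDs are all hypothetical.

```python
from statistics import mean


def predict_breed(dog_vaf, breed_variants, min_score=0.0):
    """Score one dog against each breed's specific variants by mean VAF.

    dog_vaf: {variant_id: VAF}; variants absent from the dict count as 0.0.
    breed_variants: {breed: [variant_id, ...]}.
    Returns (best breed, or None if no score exceeds min_score, scores).
    """
    scores = {
        breed: mean(dog_vaf.get(v, 0.0) for v in variants)
        for breed, variants in breed_variants.items()
        if variants  # skip breeds with no variants to avoid mean() of nothing
    }
    if not scores:
        return None, scores
    best = max(scores, key=scores.get)
    return (best if scores[best] > min_score else None), scores


# Hypothetical variant panel and one dog's VAFs (made-up values)
panel = {"Golden Retriever": ["v1", "v2"], "Maltese": ["v3", "v4"]}
print(predict_breed({"v1": 0.5, "v2": 1.0, "v3": 0.0}, panel))
```

A dog from a mixed or unrepresented breed would tend to score low on every set and return `None`, which loosely mirrors why such dogs do not fall cleanly into one breed cluster.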

b. Corroboration with the WGS dataset. Dogs from both WES and WGS datasets were clustered with VAF values of the 10 breed- 
specific variants (see Figure 2). In the “Data type” bar, the “WES(WGS)” label represents dogs that have both WES and WGS 
data but were clustered here solely based on their WES data. 

c. Corroboration with all 626 dogs from both WES and WGS datasets using the 10 breed-specific variants. All dogs whose data
passed the QC shown in Figure 1 and Supplementary Figure 1 were clustered here, including dogs from mixed breeds or breeds
with <10 animals. The image is presented as described in Figure 2.

Source data are provided as a Source Data file. 
Supplementary Figure 3. Somatic mutation discovery and filtering.
a. Distributions of the six base substitution counts in each tumor of each study called by
MuTect. Each study is represented by the tumor type and the institute name. MT: mammary
tumor; GLM: glioma; BCL: B-cell lymphoma; TCL: T-cell lymphoma; OM: oral melanoma;
OSA: osteosarcoma; HSA: hemangiosarcoma; UCL: unclassified. CUK: Catholic University
of Korea; SNU: Seoul National University; JL: Jackson Laboratory; SI: Sanger Institute; BI:
Broad Institute; UPenn: University of Pennsylvania.

b. Distributions of the six base substitution counts in each tumor of each study called by 
MuTect, followed by the 5-step filtering¹ (see Methods).

c. Distributions of the six base substitution counts in each tumor of each study called by 
MuTect, followed by the 5-step filtering¹ and paired-read strand orientation bias filtering (see
Methods). 

d. Distributions of base substitution and small indel counts in each tumor of each study. Base 
substitutions were discovered as described in c, and small indels were discovered with 
Strelka (see Methods). The results indicate that small indels are far less frequent than base
substitutions.

Source data are provided as a Source Data file. 
Supplementary Figure 4. Comparison of somatic mutation findings between our study and
the original publications. 

a-c. Mutations discovered at each of the three steps of our pipeline (see Supplementary Figure
3a-c) were compared to those from the original publications. For each mutation in each sample,
the genomic coordinate and the actual mutation, which are published only for mammary tumor
(MT)² (a) and oral melanoma (OM)¹ (b), were compared. Distributions of identical and different
mutation counts in each sample were plotted. As the original MT publication² used MuTect2
instead of MuTect for mutation calling, we also conducted the analyses using MuTect2 and
plotted the comparison (c).
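The per-sample comparison amounts to set operations on mutation keys. A minimal sketch, assuming each mutation is keyed by (chromosome, position, reference allele, alternate allele), so that a call at the same coordinate with a different alternate allele counts as different; the function names, key layout and sample ID are illustrative, and the calls are made up.

```python
def compare_calls(ours, published):
    """Count identical and differing mutations for one sample.

    Each argument is an iterable of (chrom, pos, ref, alt) keys; a mutation is
    'identical' only if coordinate and alleles all match.
    """
    ours, published = set(ours), set(published)
    return {
        "identical": len(ours & published),
        "ours_only": len(ours - published),
        "published_only": len(published - ours),
    }


def compare_by_sample(ours_by_sample, published_by_sample):
    """Apply compare_calls to every sample present in either call set."""
    samples = sorted(set(ours_by_sample) | set(published_by_sample))
    return {
        s: compare_calls(ours_by_sample.get(s, ()), published_by_sample.get(s, ()))
        for s in samples
    }


# Hypothetical calls for one sample: same site, different alt allele at chr5
ours = {"sample_A": {("chr1", 1000, "C", "T"), ("chr5", 2000, "G", "A")}}
published = {"sample_A": {("chr1", 1000, "C", "T"), ("chr5", 2000, "G", "C")}}
print(compare_by_sample(ours, published))
```

The per-sample counts returned here are the kind of values whose distributions are plotted in panels a-c.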

d. Examples of somatic mutations found only by our study (left two images) or by the 
original publication (right image). Images are screenshots from the IGV program, with the
tumor name provided (e.g., DD00013) and the mutation indicated by an orange rectangle. 

Source data are provided as a Source Data file. 
[Supplementary Figure 5 panel label: Cross-tumor / breed TMB differences (Figures 6a & 7a)]
Supplementary Figure 5. Power calculation.

a. One-sample power for mutation detection. The power of detecting a mutation within a tumor type or
breed is estimated from the mutation prevalence and the sample size. Simulated curves were generated as described in
Methods.
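For illustration only, a one-sample detection power of this kind can be sketched with a binomial model: if each tumor independently carries a mutation with probability equal to its prevalence, the power is the probability of observing at least `min_carriers` mutated tumors among `n`. The exact binomial calculation (rather than simulation), the function name and the `min_carriers` threshold are assumptions, not necessarily what the Methods use.

```python
from math import comb


def detection_power(prevalence: float, n: int, min_carriers: int = 1) -> float:
    """Probability of observing at least `min_carriers` mutated tumors among n.

    Assumes a binomial model: each tumor carries the mutation independently
    with probability `prevalence` (an illustrative assumption).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    # P(X < min_carriers), summed term by term, then complemented
    p_below = sum(
        comb(n, i) * prevalence**i * (1.0 - prevalence) ** (n - i)
        for i in range(min_carriers)
    )
    return 1.0 - p_below


# Power curve over sample sizes for a mutation present in 5% of tumors
for n in (10, 20, 50, 100):
    print(n, round(detection_power(0.05, n, min_carriers=2), 3))
```

With `min_carriers=1` this reduces to 1 - (1 - prevalence)^n, the familiar chance of seeing a mutation at least once.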

b. Two-sample Fisher exact test power. The power of each Fisher exact test shown in Figure 3b was calculated using: 1) the actual 
sample sizes of the two groups being compared; and 2) odds ratios as described in Methods. 
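A simulation-based sketch of Fisher exact test power, assuming group 1 has mutation prevalence `p1`, group 2 prevalence is implied by the odds ratio, and tumors are mutated independently. The Fisher test is implemented with the standard library (hypergeometric probabilities; two-sided by summing tables no more probable than the observed one); the names `fisher_two_sided` and `fisher_power` and the defaults for `alpha`, `n_sim` and `seed` are hypothetical.

```python
import random
from math import comb


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are mutated / not mutated.
    """
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Hypergeometric probability of each feasible table with fixed margins
    probs = {x: comb(r1, x) * comb(r2, c1 - x) / denom for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Sum tables no more probable than the observed one (a small relative
    # tolerance guards against floating-point ties)
    p = sum(pr for pr in probs.values() if pr <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fisher_power(n1, n2, p1, odds_ratio, alpha=0.05, n_sim=2000, seed=0):
    """Fraction of simulated studies in which the Fisher test rejects at alpha."""
    # Group-2 prevalence implied by the odds ratio relative to group 1
    odds2 = odds_ratio * p1 / (1.0 - p1)
    p2 = odds2 / (1.0 + odds2)
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        if fisher_two_sided(x1, n1 - x1, x2, n2 - x2) < alpha:
            rejections += 1
    return rejections / n_sim


print(round(fisher_power(30, 30, 0.1, 5.0, n_sim=500), 2))
```

Simulation is used because Fisher's test is discrete, so closed-form power approximations can be inaccurate at the small group sizes typical of breed comparisons.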

c. Two-sample Wilcoxon test power for TMB comparison shown in Figures 6a and 7a. Each power was calculated using: 1) the 
actual sample sizes of the two groups being compared; and 2) an effect size as described in Methods. 
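Wilcoxon power can similarly be approximated by simulation. This sketch assumes log-normal TMB with the effect size expressed as a shift in mean log-TMB (in units of its standard deviation), and uses the normal approximation to the rank-sum statistic without tie correction; these modeling choices and the function names are assumptions for illustration, not the procedure in Methods.

```python
import random
from statistics import NormalDist


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation.

    Assumes no ties (continuous data), so no tie correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of ranks (1-based) belonging to the first sample
    r1 = sum(rank for rank, (_, group) in enumerate(pooled, start=1) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def wilcoxon_power(n1, n2, effect_size, alpha=0.05, n_sim=2000, seed=0):
    """Simulated power when log-TMB differs by `effect_size` standard deviations."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # TMB drawn as log-normal; only the mean of log-TMB differs between groups
        x = [rng.lognormvariate(0.0, 1.0) for _ in range(n1)]
        y = [rng.lognormvariate(effect_size, 1.0) for _ in range(n2)]
        if rank_sum_p(x, y) < alpha:
            rejections += 1
    return rejections / n_sim


print(wilcoxon_power(20, 15, 1.0, n_sim=500))
```

Because the rank-sum test depends only on ranks, the log-normal choice affects realism of the simulated TMB values but not the power itself; any monotone transform of the data gives the same result.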

d. Two-sample Wilcoxon test power for TMB-gene mutation association analysis shown in Figures 6b, 6c and 7b. The calculation is 
described in Methods. 

Source data are provided as a Source Data file. 
Supplementary Figure 6. TMB at different sequence coverage of 30-50x, 50-100x and >100x.

a. Distribution of tumor mean coverage in each WES study. n = 13, 169, 20, 2, 24, 23, 3, 21,
31, 1, 16, 21, 5, 61, 5, 1, 65, 3, 9, 6, 22, 13, 6 and 2 independent tumors from left to right. 
Two-sided Wilcoxon tests were conducted as indicated; **: p = 0.007. 

b. TMB comparison at matched sequence coverage among tumor types. n = 0, 2, 3, 1, 5, 0, 12,
5, 18, 33, 24, 21, 16, 61, 4, 24, 78, 105, 169, 23, 31, 21, 5, 74, 13, 223 and 113 independent
tumors left to right. Two-sided Wilcoxon tests were conducted; NS (not significant):
p = 0.1296 and ****: p = 2.3678e-16 (left) and 7.3985e-18 (right).
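Coverage-matched comparisons like this one require assigning each tumor to a coverage bin before comparing TMB. The sketch below assumes TMB is computed as somatic mutations per megabase of callable territory and that bin edges are [30, 50), [50, 100] and >100x, with tumors below 30x excluded; the edge handling, the record layout, the function names and the toy numbers are hypothetical.

```python
from collections import defaultdict
from statistics import median


def coverage_bin(mean_coverage: float):
    """Hypothetical bin edges: [30, 50) -> '30-50x', [50, 100] -> '50-100x', >100 -> '>100x'."""
    if mean_coverage < 30:
        return None  # below the lowest bin; excluded in this sketch
    if mean_coverage < 50:
        return "30-50x"
    if mean_coverage <= 100:
        return "50-100x"
    return ">100x"


def tmb(n_mutations: int, callable_bp: int) -> float:
    """Somatic mutations per megabase of callable territory."""
    return n_mutations / (callable_bp / 1e6)


def median_tmb_by_bin(tumors):
    """tumors: iterable of (tumor_type, mean_coverage, n_mutations, callable_bp).

    Returns {(tumor_type, coverage_bin): median TMB}.
    """
    groups = defaultdict(list)
    for tumor_type, coverage, n_mutations, callable_bp in tumors:
        cov_bin = coverage_bin(coverage)
        if cov_bin is not None:
            groups[(tumor_type, cov_bin)].append(tmb(n_mutations, callable_bp))
    return {key: median(values) for key, values in groups.items()}


# Toy records (made-up numbers, for illustration only)
toy = [
    ("OSA", 120.0, 60, 30_000_000),
    ("OSA", 130.0, 30, 30_000_000),
    ("GLM", 40.0, 15, 30_000_000),
    ("GLM", 20.0, 99, 30_000_000),
]
print(median_tmb_by_bin(toy))  # {('OSA', '>100x'): 1.5, ('GLM', '30-50x'): 0.5}
```

Comparing tumor types only within the same bin keeps differences in sequencing depth from masquerading as differences in TMB.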

c. Distribution of tumor mean coverages among breeds within a study. The left plot indicates
that only Schnauzer in MT has a coverage distribution differing from that of other breeds (**: p =
0.002 from two-sided Fisher exact test). The right plot shows that Schnauzer has a higher
median TMB than Maltese at 100x coverage; n = 16, 8, 54 and 7 independent tumors left to
right; **: p = 0.009 from two-sided Wilcoxon tests. However, both plots indicate that the
Schnauzer sample size is small.
Source data are provided as a Source Data file. 
Supplementary Figure 7. Tumor type comparison of TMB obtained by other mutation
discovery tools, and dog-human TMB comparison.

a. TMB value of each tumor calculated using somatic mutations called by MuTect2 (left), or by at
least two of the three callers MuTect2, Varscan2, and LoFreq (right). n = 202, 49, 55, 38, 71,
78 and 49 independent tumors from left to right in both plots. Two-sided Wilcoxon tests
were conducted, with **** representing p = 2.6e-62 and 4.2e-36 from left to right, and fold
changes indicated.

b. Dog-human TMB comparison for matched tumor types. Two-sided Wilcoxon test p values
(**** from left to right representing p = 2e-16, 2e-16, 1.6e-15, 2e-7, 1e-10, 2.5e-5 and 6.1e-9,
respectively; *** representing p = 0.007) and fold changes are indicated for tumor types with
significantly different TMB. Abbreviations of tumor types are described in Figures 1, 4 and
6. n = 202, 1048, 49, 66, 514, 389, 55, 41, 38, 42, 71, 46, 78, 56, 49, 49 and 48 independent
tumors for each column from left to right, respectively.

Source data are provided as a Source Data file. 
Supplementary Figure 8. S1 mutation signature is TP53 mutation-independent, and
noncoding mutation signatures in WGS tumors.

a. The S1 mutation signature has a stronger association with TMB than TP53 mutation does,
within osteosarcoma in general (OSA all) and osteosarcoma Golden Retriever cases (OSA
GR) (top). S1 is independent of TP53 mutation (bottom). n = 39, 39, 67, 11, 34, 5, 33, 6, 14,
11, 14, 11, 9, 5, 5 and 6 independent tumors from left to right. Two-sided Wilcoxon tests
were conducted, with ***: p = 0.0009; ****: p = 2.2e-5; **: p = 0.003; ***: p = 0.0004; **:
p = 0.003; ***: p = 0.0003; *: p = 0.02 and **: p = 0.009 from left to right.

b. Noncoding mutation signatures detected in 36 tumors with WGS data passing our QC 
measures. The top two plots are presented as described in Figure 8a, while the bottom plot 
indicates the mutation signature distribution in each tumor, with respective breeds indicated. 

Source data are provided as a Source Data file. 